Permissive Supervisor Synthesis for Markov Decision Processes through Learning

Authors

  • Bo Wu
  • Xiaobin Zhang
  • Hai Lin
Abstract

This paper considers permissive supervisor synthesis for probabilistic systems modeled as Markov decision processes (MDPs). Such systems are prevalent in power grids, transportation networks, communication networks, and robotics. In contrast to centralized planning and optimization-based planning, we propose a novel supervisor synthesis framework based on learning and compositional model checking that generates permissive local supervisors in a distributed manner. Thanks to recent advances in assume-guarantee verification for probabilistic systems, the composed system need not be built, which alleviates state-space explosion, and our framework learns the supervisors iteratively from the counterexamples returned by verification. Our approach is guaranteed to be correct and to terminate in a finite number of steps.
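The counterexample-driven loop mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it handles a single MDP rather than a composition, replaces the learning and assume-guarantee verification steps with plain value iteration, and treats the riskiest enabled action as the "counterexample". All names (`synthesize`, `max_reach_prob`, `EXAMPLE_MDP`) and the toy model are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy encoding: mdp[state][action] = [(probability, next_state), ...]
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]
Supervisor = Dict[str, Set[str]]  # state -> set of actions the supervisor allows


def q_value(mdp: MDP, v: Dict[str, float], s: str, a: str) -> float:
    """Probability of eventually reaching the bad state after taking a in s."""
    return sum(p * v[t] for p, t in mdp[s][a])


def max_reach_prob(mdp: MDP, sup: Supervisor, bad: str, iters: int = 200) -> Dict[str, float]:
    """Value iteration for the worst-case probability of reaching `bad`
    over all choices the supervisor still allows (the 'verification' step)."""
    v = {s: 1.0 if s == bad else 0.0 for s in mdp}
    for _ in range(iters):
        for s in mdp:
            if s != bad and sup[s]:
                v[s] = max(q_value(mdp, v, s, a) for a in sup[s])
    return v


def synthesize(mdp: MDP, init: str, bad: str, threshold: float) -> Optional[Supervisor]:
    """Start fully permissive, then prune one action per failed check.
    Each round removes an action, so the loop ends after finitely many steps."""
    sup: Supervisor = {s: set(acts) for s, acts in mdp.items()}
    while True:
        v = max_reach_prob(mdp, sup, bad)
        if v[init] <= threshold:
            return sup  # every behavior still allowed meets the bound
        # Stand-in for a counterexample: the riskiest enabled action in a
        # state that keeps at least one alternative (avoids deadlock).
        candidates = [
            (q_value(mdp, v, s, a), s, a)
            for s in mdp for a in sup[s] if len(sup[s]) > 1
        ]
        if not candidates:
            return None  # nothing left to prune; no supervisor found
        _, s, a = max(candidates)
        sup[s].discard(a)


EXAMPLE_MDP: MDP = {
    "s0": {
        "safe": [(1.0, "goal")],
        "medium": [(0.95, "goal"), (0.05, "bad")],
        "fast": [(0.7, "goal"), (0.3, "bad")],
    },
    "goal": {"stay": [(1.0, "goal")]},
    "bad": {},
}

if __name__ == "__main__":
    print(synthesize(EXAMPLE_MDP, "s0", "bad", threshold=0.1))
```

With a bound of 0.1 the sketch disables only `fast` in `s0` and keeps both `safe` and `medium`, which is the sense in which the resulting supervisor is permissive: it restricts behavior no more than the specification requires in this toy model.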


Similar resources

Permissive Finite-State Controllers of POMDPs using Parameter Synthesis

We study finite-state controllers (FSCs) for partially observable Markov decision processes (POMDPs). The key insight is that computing (randomized) FSCs on POMDPs is equivalent to synthesis for parametric Markov chains (pMCs). This correspondence enables using parameter synthesis techniques to compute FSCs for POMDPs in a black-box fashion. We investigate how typical restrictions on parameter ...


Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...


Supervisor Synthesis of POMDP based on Automata Learning

As a general and thus popular model for autonomous systems, the partially observable Markov decision process (POMDP) can capture uncertainties from different sources such as sensing noise, actuation errors, and uncertain environments. However, its comprehensiveness makes planning and control in POMDPs difficult. Traditional POMDP planning problems aim to find the optimal policy to maximize the ...


Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...


Importance sampling for reinforcement learning with multiple objectives

This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms....



Journal title:
  • CoRR

Volume abs/1703.07351

Publication date 2017